Bug 3392950 - OMF output: Segment references with an adjustment wrongly processed, losing the adjustment
Status: CLOSED FIXED
Alias: None
Product: NASM
Classification: Unclassified
Component: Assembler
Version: 3.00.xx
Hardware: All All
: Medium severe
Assignee: H. Peter Anvin
URL:
Depends on:
Blocks:
 
Reported: 2025-09-03 07:45 PDT by E. C. Masloch
Modified: 2025-09-04 05:09 PDT

Obtained from: Built from git using configure
Generated by: Human
Bug category: Incorrect main output
Observed for: Production code
Regression: Yes (specify version below)
Regression since:
git 7d5e549d6385ffef050d830317d7663e88d2986e


Attachments
Possible solution patch (1.67 KB, patch)
2025-09-03 10:52 PDT, H. Peter Anvin

Description E. C. Masloch 2025-09-03 07:45:11 PDT
The NASM in ~/proj/nasmrc is 03490692b0082fe16a1936a5774f4326a55075e6

The one in ~/proj/nasm is 7a5502142b735fe62866963fa0bf3182808996b2

The other two revisions are in directories named after their commit IDs. It appears that the fix to https://bugzilla.nasm.us/show_bug.cgi?id=3392949 introduced this bug, in the commit https://github.com/netwide-assembler/nasm/commit/21c977e717d7ecea03275810dcc11c082d4f20f0

In the tests, both uses of a segment value with an adjustment are encoded as 0000h in the object file's LEDATA16 records, where we expect 0010h for the one and FFC6h for the other.
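
As a rough cross-check of the two expected values, the following Python sketch redoes the arithmetic from test.asm and rebuilds the ten LEDATA16 data bytes shown for the good build. It assumes that the section's own segment value is stored as zero in the record and filled in later through the FIXUPP record, so that only the adjustment remains in the field. The names `SEGMENT_PLACEHOLDER` and `word` are introduced for illustration only.

```python
import struct

# Constants copied from test.asm in this report.
DOSENTRYADJUSTSEGMENT = 0x60 - 0x26
DOSENTRYDEVICEBASE = 0x10

# Assumption: the segment value itself is stored as 0 in LEDATA and is
# supplied by the linker via the FIXUPP record; only the adjustment stays.
SEGMENT_PLACEHOLDER = 0
AUXDEV2 = 4            # label offset: directly after the two dw words
ENTRYPOINT = AUXDEV2   # both labels sit at the same offset


def word(value):
    """Truncate a value to 16 bits, as a word-sized field stores it."""
    return value & 0xFFFF


seg_plus = word(SEGMENT_PLACEHOLDER + DOSENTRYDEVICEBASE)      # seg AUXDEV2 + 10h
seg_minus = word(SEGMENT_PLACEHOLDER - DOSENTRYADJUSTSEGMENT)  # DOSENTRY - 3Ah

expected = (
    struct.pack("<HH", word(AUXDEV2 - DOSENTRYDEVICEBASE * 16), seg_plus)
    + bytes([0xC7, 0x06])  # mov word [disp16], imm16
    + struct.pack("<HH", word(ENTRYPOINT + 3), seg_minus)
)

print(f"{seg_plus:04X}h {seg_minus:04X}h")
print(expected.hex(" "))
```

The second printed line can be compared against the `0000:` data line of the LEDATA16 record in the dumps that follow.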

test$ cat test.asm
DOSENTRYADJUSTSEGMENT equ 60h - 26h
DOSENTRYADJUSTOFFSET equ DOSENTRYADJUSTSEGMENT * 16
DOSENTRYDEVICEBASE equ 10h
%idefine PTR

        section DOSENTRY

dw      AUXDEV2 - DOSENTRYDEVICEBASE * 16, seg AUXDEV2 + DOSENTRYDEVICEBASE

AUXDEV2:
ENTRYPOINT:

MOV     WORD PTR [ENTRYPOINT+3], DOSENTRY - DOSENTRYADJUSTSEGMENT
test$ ~/proj/nasmrc/nasm test.asm -fobj -o testrc.obj
test$ ~/proj/nasm/nasm test.asm -fobj -o testnew.obj
test$ ~/proj/omfdump/omfdump testrc.obj > testrc.txt
test$ ~/proj/omfdump/omfdump testnew.obj > testnew.txt
test$ diff -u testrc.txt testnew.txt
--- testrc.txt  2025-09-03 16:27:09.961394125 +0200
+++ testnew.txt 2025-09-03 16:27:15.453514907 +0200
@@ -1,10 +1,9 @@
 80 THEADR       10 bytes, checksum 3F (valid)
    0000: 08 74 65 73 74 2e 61 73-6d                       :  .test.asm
-88 COMENT       36 bytes, checksum E7 (valid)
+88 COMENT       33 bytes, checksum 7E (valid)
    [NP=0 NL=0 UD=00] 00 Translator
-   0002: 20 54 68 65 20 4e 65 74-77 69 64 65 20 41 73 73  :   The Netwide Ass
-   0012: 65 6d 62 6c 65 72 20 32-2e 31 36 2e 30 32 72 63  :  embler 2.16.02rc
-   0022: 32                                               :  2
+   0002: 1d 54 68 65 20 4e 65 74-77 69 64 65 20 41 73 73  :  .The Netwide Ass
+   0012: 65 6d 62 6c 65 72 20 32-2e 31 37 72 63 30        :  embler 2.17rc0
 96 LNAMES       11 bytes, checksum DF (valid)
    [0001] ''
    0000: 00 08 44 4f 53 45 4e 54-52 59                    :  .
@@ -17,9 +16,9 @@
 88 COMENT        4 bytes, checksum 91 (valid)
    [NP=0 NL=1 UD=00] A2 Link pass separator
    0002: 01                                               :  .
-a0 LEDATA16     14 bytes, checksum A5 (valid)
+a0 LEDATA16     14 bytes, checksum 7A (valid)
                 segment 'DOSENTRY', offset 0000
-   0000: 04 ff 10 00 c7 06 07 00-c6 ff                    :  ..........
+   0000: 04 ff 00 00 c7 06 07 00-00 00                    :  ..........
 9c FIXUPP16     17 bytes, checksum D7 (valid)
    FIXUP  segment-relative, type 1 (16-bit offset)
           record offset 0000
test$ ~/proj/nasmtest/21c977e717d7ecea03275810dcc11c082d4f20f0/nasm test.asm -fobj -o test21c9.obj
test$ ~/proj/nasmtest/7d5e549d6385ffef050d830317d7663e88d2986e/nasm test.asm -fobj -o test7d5e.obj
test$ ~/proj/omfdump/omfdump test21c9.obj > test21c9.txt
test$ ~/proj/omfdump/omfdump test7d5e.obj > test7d5e.txt
test$ diff -u testrc.txt test7d5e.txt
--- testrc.txt  2025-09-03 16:27:09.961394125 +0200
+++ test7d5e.txt        2025-09-03 16:29:45.500812211 +0200
@@ -1,10 +1,10 @@
 80 THEADR       10 bytes, checksum 3F (valid)
    0000: 08 74 65 73 74 2e 61 73-6d                       :  .test.asm
-88 COMENT       36 bytes, checksum E7 (valid)
+88 COMENT       36 bytes, checksum E5 (valid)
    [NP=0 NL=0 UD=00] 00 Translator
    0002: 20 54 68 65 20 4e 65 74-77 69 64 65 20 41 73 73  :   The Netwide Ass
    0012: 65 6d 62 6c 65 72 20 32-2e 31 36 2e 30 32 72 63  :  embler 2.16.02rc
-   0022: 32                                               :  2
+   0022: 34                                               :  4
 96 LNAMES       11 bytes, checksum DF (valid)
    [0001] ''
    0000: 00 08 44 4f 53 45 4e 54-52 59                    :  .
test$ diff -u testrc.txt test21c9.txt
--- testrc.txt  2025-09-03 16:27:09.961394125 +0200
+++ test21c9.txt        2025-09-03 16:29:35.712597242 +0200
@@ -1,10 +1,10 @@
 80 THEADR       10 bytes, checksum 3F (valid)
    0000: 08 74 65 73 74 2e 61 73-6d                       :  .test.asm
-88 COMENT       36 bytes, checksum E7 (valid)
+88 COMENT       36 bytes, checksum E5 (valid)
    [NP=0 NL=0 UD=00] 00 Translator
    0002: 20 54 68 65 20 4e 65 74-77 69 64 65 20 41 73 73  :   The Netwide Ass
    0012: 65 6d 62 6c 65 72 20 32-2e 31 36 2e 30 32 72 63  :  embler 2.16.02rc
-   0022: 32                                               :  2
+   0022: 34                                               :  4
 96 LNAMES       11 bytes, checksum DF (valid)
    [0001] ''
    0000: 00 08 44 4f 53 45 4e 54-52 59                    :  .
@@ -17,9 +17,9 @@
 88 COMENT        4 bytes, checksum 91 (valid)
    [NP=0 NL=1 UD=00] A2 Link pass separator
    0002: 01                                               :  .
-a0 LEDATA16     14 bytes, checksum A5 (valid)
+a0 LEDATA16     14 bytes, checksum 7A (valid)
                 segment 'DOSENTRY', offset 0000
-   0000: 04 ff 10 00 c7 06 07 00-c6 ff                    :  ..........
+   0000: 04 ff 00 00 c7 06 07 00-00 00                    :  ..........
 9c FIXUPP16     17 bytes, checksum D7 (valid)
    FIXUP  segment-relative, type 1 (16-bit offset)
           record offset 0000
test$
Comment 1 E. C. Masloch 2025-09-03 07:45:51 PDT
omfdump is from https://github.com/boeckmann/omfdump/
Comment 2 H. Peter Anvin 2025-09-03 07:58:07 PDT
Ah, nice... I may want to "steal" that one for NASM as well.
Comment 3 H. Peter Anvin 2025-09-03 10:52:03 PDT
Created attachment 411943
Possible solution patch
Comment 4 H. Peter Anvin 2025-09-03 10:52:28 PDT
I have attached a patch which I think might be able to resolve this problem. Do you think you could test it out?
Comment 5 E. C. Masloch 2025-09-03 11:03:29 PDT
I was about to push a new revision of lDOS built using the 2025 August git NASM. Luckily, I compared all four files (instsect.com, format.exe, share.exe, msbiow.exe) using my ident86 tool. The differences in msbiow.exe turned out to include this bug's traces of zeroes rather than the adjustments for segment references. The new kernel build also, it turns out, failed to boot.

Created using this command: ~/proj/ident86/ident86.py -s aug/msbiow.exe sep/msbiow.exe sep/msbio.tls sep/msbiow.map | tee msbio.txt

Result uploaded to https://pushbx.org/ecm/test/20250903/ident/msbio.txt
Comment 6 E. C. Masloch 2025-09-03 11:05:07 PDT
(In reply to H. Peter Anvin from comment #4)
> I have attached a patch which I think might be able to resolve this problem.
> Do you think you could test it out?

Yes, I tested it. I didn't build the kernel files yet but the test1.exe from https://bugzilla.nasm.us/show_bug.cgi?id=3392949 and the test.asm from this report appear to both work with that.
Comment 7 H. Peter Anvin 2025-09-03 11:11:01 PDT
I have checked in this fix.

I'm leaving this as PENDING/FIXED for now; I will close it formally after you verify it solves your problem.
Comment 8 E. C. Masloch 2025-09-03 11:22:58 PDT
(In reply to H. Peter Anvin from comment #7)
> I have checked in this fix.
> 
> I'm leaving this as PENDING/FIXED for now; I will close it formally after
> you verify it solves your problem.

The ident86 report on the msbiow.exe file is as follows (minus verbose details):

ident86 version: hg 6c01457081ae
lDebug version: "lDebug (2025-03-09)"
Number of files: 4
File 1: [...]/build-dl-wwwecm/msdos4/src/BIOS/msbiow.exe
File 2: src/BIOS/msbiow.exe
Trace listing file: src/BIOS/msbio.tls
WarpLink map file: src/BIOS/msbiow.map
Not merged map ranges:
Merged map ranges:
MZ executable header detected, size = 512 bytes
EOF1 reached at 80592 bytes
EOF2 reached at 80592 bytes
Files are the same length (80592 bytes)
Amount different bytes: 55
Amount different lines: 0
Amount not different ranges: 39

The kernel also boots. I compared instsect.com, format.exe, and share.exe as well and they all seem to be identicalised (only "no difference" ranges).
Comment 9 E. C. Masloch 2025-09-03 11:24:17 PDT
I assume that NASM changed its encoding choices "fingerprint" by accident. It does present challenges in comparing files.
Comment 10 H. Peter Anvin 2025-09-03 11:51:35 PDT
This, unfortunately, is often a result of internal changes as new formats are supported.  However, if there are specific things that are giving you a headache, at least let me know so I can see if it is (a) not a bug and (b) trivially addressable...
Comment 11 E. C. Masloch 2025-09-03 12:29:10 PDT
(In reply to H. Peter Anvin from comment #10)
> This, unfortunately, is often a result of internal changes as new formats
> are supported.  However, if there are specific things that are giving you a
> headache, at least let me know so I can see if it is (a) not a bug and (b)
> trivially addressable...

I don't think there are any bugs, because ident86 detected all of them as "no difference". (This means same instruction length + same semantic meaning.) I redid the identicalisation setup that I used to detect this bug, but with your patch applied this time. If you want, you can check every listed byte change to figure out what instruction it is a part of. (I may add an option to ident86 soon to help with that.)

The instsect.com file doesn't have a corresponding .tls file yet because it is an -f bin output file, but ident86 happened to match all instruction boundaries correctly anyway it seems.

The compared binaries, listing files, map files, sorted section files, and full ident86 reports (including verbose details) are found in https://pushbx.org/ecm/test/20250903/ident.new/

To find the instructions corresponding to an ident86 file offset, you have to subtract the MZ header size (200h) then look up the address in the .srt file. Then determine the base of the named section using the .tls file's announcements, like in << === Switch to base=002450h -> "DOSCODECODE" >>, subtract that base from your address, and find the result as the .tls machine code dump offset. To filter out the desired section from the .tls file for easier search, you could use something like the following command:

~/proj/tractest/listvars.pl msbiow.map msbio.tls --filter-section=SYSINITGROUP

The .txt and .srt files were generated using commands like these:

~/proj/ident86/ident86.py -s aug/msbiow.exe sep/msbiow.exe sep/msbio.tls sep/msbiow.map | tee msbio.txt

~/proj/tractest/sortmap.pl msbiow.map --skip-empty --list-align > msbiow.srt
Comment 12 E. C. Masloch 2025-09-03 12:54:22 PDT
Added a -D option to ident86: https://hg.pushbx.org/ecm/ident86/rev/a33e4f6e652d

This is to make it dump the mismatching instructions even if they are semantically the same, in the .log files, e.g. https://pushbx.org/ecm/test/20250903/ident.new/msbio.log

Example:

004B69  first:C3 != second:D8
004B71  first:C3 != second:D8
004B68 up to below 004B81, first=004B69 last=004B71
first:  004B68 +2 xchg al, bl           second: samesame
first:  004B6A +1 inc di                second: samesame
first:  004B6B +3 mov cx, 000A          second: samesame
first:  004B6E +2 rep movsw             second: samesame
first:  004B70 +2 xchg al, bl           second: samesame

The first:/second: lines list the numeric mismatches, while the side-by-side disassembly lists what instructions they are a part of. Both use file offsets as addresses. The plus numbers give the length of each instruction. In this case, NASM encodes xchg al, bl differently from the way it used to.
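
To illustrate why C3 and D8 can stand for the same instruction, here is a minimal Python sketch that splits the ModRM byte following the 86h opcode into its mod, reg, and r/m fields. It is not how ident86 works internally; the helper names are made up for this example. Because xchg is symmetric, the two encodings name the same register pair with the reg and r/m fields swapped.

```python
# 8-bit register names indexed by their 3-bit encoding.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def xchg8_operands(modrm):
    """Return the unordered register pair of a register-form xchg r8, r8."""
    mod, reg, rm = decode_modrm(modrm)
    if mod != 3:
        raise ValueError("not a register-to-register form")
    return frozenset((REG8[reg], REG8[rm]))


for modrm in (0xC3, 0xD8):
    print(f"86 {modrm:02X}: fields={decode_modrm(modrm)} "
          f"registers={sorted(xchg8_operands(modrm))}")
```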

In instsect.log you can see that it doesn't know where exactly to start disassembly, so you get 16 bytes before the different bytes. And it sometimes guesses instruction boundaries wrong, albeit this is not a major problem here. This is due to the lack of a .tls file.
Comment 13 E. C. Masloch 2025-09-03 13:07:14 PDT
As an example, to find the .tls trace listing position that corresponds to file offset 004B68h, you:

- Subtract the 200h for the MZ header, yielding 4968h.

- Look up 4968h in the sorted .map sections file, https://pushbx.org/ecm/test/20250903/ident.new/sep/msbiow.srt

This is the match:

DOSCODECODE      DOSCODEGROUP   s=04903h l=01D1h a=1   dos/search

- Look up DOSCODECODE section base in the .tls file:

=== Switch to base=002450h -> "DOSCODECODE"

- Subtract the base from 4968h, resulting in 2518h.

- Search for that number as the 8-hexit start offset of a machine code dump, finding https://hg.pushbx.org/ecm/tlsfiles/file/9abe4783ff05/msbio.tls#l79917

     0 00002518 86C3                            XCHG    AL,BL                   ; Search byte to BL, user byte to AL
     0 0000251A 47                              INC     DI
   116                                  ;       STOSB                           ; Store the correct "user" drive byte
   117                                                                          ;  at the start of the search info
     0 0000251B B90A00                          MOV     CX,20/2
     0 0000251E F3A5                            REP     MOVSW                   ; Rest of search cont info, SI -> entry
     0 00002520 86C3                            XCHG    AL,BL                   ; User drive byte back to BL, search
   121                                                                          ;   byte to AL
     0 00002522 AA                              STOSB                           ; Search contin drive byte at end of
   123                                                                          ;   contin info

- Locate source text corresponding to this listing: https://hg.pushbx.org/ecm/msdos4/file/637cfcc5a4d1/src/DOS/search.nas#l114
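
The arithmetic in the steps above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and reading `s=` and `l=` in the .srt line as the start and length of the range is an assumption based on the example entry.

```python
MZ_HEADER_SIZE = 0x200  # header size reported by ident86 (512 bytes)


def tls_offset(file_offset, range_start, range_length, section_base):
    """Map an ident86 file offset to a .tls machine code dump offset.

    range_start and range_length come from the matching .srt line
    (assumed meaning of s= and l=); section_base comes from the .tls
    file's "Switch to base=" announcement for that section.
    """
    image_offset = file_offset - MZ_HEADER_SIZE
    if not range_start <= image_offset < range_start + range_length:
        raise ValueError("offset is outside the given .srt range")
    return image_offset - section_base


# Values from the worked example: file offset 004B68h, the dos/search
# range s=04903h l=01D1h, and DOSCODECODE base=002450h.
offset = tls_offset(0x4B68, 0x4903, 0x1D1, 0x2450)
print(f"{offset:08X}")
```

The printed value is the 8-hexit number to search for in the .tls file.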
Comment 14 E. C. Masloch 2025-09-03 13:18:09 PDT
Here's the msbio ident86 report with -D -j -J. This dumps the relevant listing file lines and uses escape sequences to highlight the different bytes in the hexdump. I'd forgotten that the j options could do this.

File at https://pushbx.org/ecm/test/20250903/ident.new/msbio.ext
Comment 15 E. C. Masloch 2025-09-03 13:23:57 PDT
It seems all differences are xchg reg,reg instructions, in share, format, and msbio. Probably in instsect too but due to lacking a .tls file the -j -J options to ident86 cannot work.
Comment 16 E. C. Masloch 2025-09-03 14:37:23 PDT
Retried with new ident86 options -D -Y. The instsect changes are all xchg reg,reg as well.
Comment 17 E. C. Masloch 2025-09-04 03:10:07 PDT
Both NASM and NDISASM swap xchg reg,reg operands (where neither reg is ax).

test$ cat test.asm

        xchg bx, dx
        xchg cx, dx
test$ nasm test.asm -l /dev/stderr
     1
     2 00000000 87DA                            xchg bx, dx
     3 00000002 87CA                            xchg cx, dx
test$ ndisasm test
00000000  87DA              xchg bx,dx
00000002  87CA              xchg cx,dx
test$ ~/proj/nasmtest/patch/nasm test.asm -l /dev/stderr
     1
     2 00000000 87D3                            xchg bx, dx
     3 00000002 87D1                            xchg cx, dx
test$ ~/proj/nasmtest/patch/ndisasm test
00000000  87D3              xchg bx,dx
00000002  87D1              xchg cx,dx
test$ ndisasm test
00000000  87D3              xchg dx,bx
00000002  87D1              xchg dx,cx
test$
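
The session above can be reproduced on paper by decoding the ModRM byte after opcode 87h under two operand-order conventions. The Python sketch below is an interpretation inferred from the listings, not taken from the NASM or NDISASM sources: it assumes the older binaries treat the reg field as the first operand and the patched ones treat the r/m field as the first operand. The function name and the `rm_first` parameter are hypothetical.

```python
# 16-bit register names indexed by their 3-bit encoding.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def disasm_xchg16(modrm, rm_first):
    """Render a register-form 87 /r xchg under one operand-order convention."""
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3:
        raise ValueError("not a register-to-register form")
    first, second = (rm, reg) if rm_first else (reg, rm)
    return f"xchg {REG16[first]},{REG16[second]}"


# Older convention (assumed: reg field first).
print(disasm_xchg16(0xDA, rm_first=False))
print(disasm_xchg16(0xD3, rm_first=False))
# Patched convention (assumed: r/m field first).
print(disasm_xchg16(0xD3, rm_first=True))
print(disasm_xchg16(0xD1, rm_first=True))
```

Under these assumptions, the same two bytes 87 D3 read as `xchg dx,bx` to the older disassembler and as `xchg bx,dx` to the patched one, matching the last two ndisasm runs.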
Comment 18 E. C. Masloch 2025-09-04 05:09:15 PDT
For a patch fixing the fingerprints refer to https://bugzilla.nasm.us/show_bug.cgi?id=3392951